Sample Complexity of Learning Mahalanobis Distance Metrics
Abstract
Metric learning seeks a transformation of the feature space that enhances prediction quality for a given task. In this work we provide PAC-style sample complexity rates for supervised metric learning. We give matching lower and upper bounds showing that sample complexity scales with the representation dimension when no assumptions are made about the underlying data distribution. In addition, by leveraging the structure of the data distribution, we provide rates fine-tuned to a specific notion of the intrinsic complexity of a given dataset, allowing us to relax the dependence on representation dimension. We show both theoretically and empirically that augmenting the metric learning optimization criterion with a simple norm-based regularization is important and can help adapt to a dataset’s intrinsic complexity, yielding better generalization and thus partly explaining the empirical success of similar regularizations reported in previous works.
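The abstract does not spell out the optimization criterion, so the following is only a rough illustration of "a metric learning criterion augmented with a norm-based regularizer". It parametrizes a Mahalanobis metric as M = L^T L, so the squared distance is ||L(x1 - x2)||^2, and fits L to labeled similar/dissimilar pairs by minimizing a pairwise hinge loss plus a Frobenius-norm penalty lam * ||L||_F^2. The hinge loss, the threshold b, the choice of penalty, the step size, and the function names mahalanobis_sq, regularized_risk, and fit_metric are all assumptions made for this sketch, not the paper's formulation. It requires NumPy.

```python
import numpy as np


def mahalanobis_sq(L, x1, x2):
    """Squared distance ||L (x1 - x2)||^2 = (x1 - x2)^T M (x1 - x2) with M = L^T L."""
    diff = L @ (np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float))
    return float(diff @ diff)


def regularized_risk(L, pairs, labels, lam, b=2.0):
    """Mean pairwise hinge loss plus lam * ||L||_F^2 (an assumed regularizer).

    labels[i] = +1 for a "similar" pair (want distance <= b - 1) and
    labels[i] = -1 for a "dissimilar" pair (want distance >= b + 1).
    """
    losses = [max(0.0, 1.0 - y * (b - mahalanobis_sq(L, x1, x2)))
              for (x1, x2), y in zip(pairs, labels)]
    return float(np.mean(losses)) + lam * float(np.sum(L * L))


def fit_metric(pairs, labels, dim, lam=0.01, b=2.0, lr=0.05, epochs=200):
    """Full-batch subgradient descent on L, starting from the Euclidean metric."""
    L = np.eye(dim)
    n = len(pairs)
    for _ in range(epochs):
        grad = 2.0 * lam * L  # gradient of lam * ||L||_F^2
        for (x1, x2), y in zip(pairs, labels):
            delta = np.asarray(x1, dtype=float) - np.asarray(x2, dtype=float)
            if 1.0 - y * (b - mahalanobis_sq(L, x1, x2)) > 0:  # hinge is active
                # d/dL of ||L delta||^2 is 2 (L delta) delta^T
                grad += y * 2.0 * np.outer(L @ delta, delta) / n
        L = L - lr * grad
    return L


if __name__ == "__main__":
    # Hypothetical toy data: coordinate 0 separates the classes, coordinate 1 is noise.
    pairs = [(np.array([0.0, 0.0]), np.array([0.1, 1.5])),  # similar pair
             (np.array([0.0, 0.0]), np.array([1.5, 0.1]))]  # dissimilar pair
    labels = [+1, -1]
    L = fit_metric(pairs, labels, dim=2)
    M = L.T @ L
    print("risk at Euclidean metric:", regularized_risk(np.eye(2), pairs, labels, lam=0.01))  # about 1.02
    print("risk at learned metric:  ", regularized_risk(L, pairs, labels, lam=0.01))
    print("learned M:\n", M)
```

Parametrizing through L keeps M positive semidefinite by construction, at the cost of an objective that is no longer convex in L; projected gradient descent directly on M is a common alternative. In this toy run the fitted M weights the informative coordinate more heavily than the noisy one, while the penalty keeps L from growing without bound.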
Similar resources
An investigation on scaling parameter and distance metrics in semi-supervised Fuzzy c-means
The scaling parameter α helps maintain a balance between supervised and unsupervised learning in semi-supervised Fuzzy c-Means (ssFCM). In this study, we investigated the effects of different α values, 0.1, 0.5, 1 and 10 in Pedrycz and Waletsky’s ssFCM with various amounts of labelled data, 10%, 20%, 30%, 40%, 50% and 60% and three distance metrics, Euclidean, Mahalanobis and kernel-based on th...
metricDTW: local distance metric learning in Dynamic Time Warping
We propose to learn multiple local Mahalanobis distance metrics to perform k-nearest neighbor (kNN) classification of temporal sequences. Temporal sequences are first aligned by dynamic time warping (DTW); given the alignment path, similarity between two sequences is measured by the DTW distance, which is computed as the accumulated distance between matched temporal point pairs along the alignme...
Active Metric Learning for Supervised Classification
Clustering and classification critically rely on distance metrics that provide meaningful comparisons between data points. We present mixed-integer optimization approaches to find optimal distance metrics that generalize the Mahalanobis metric extensively studied in the literature. Additionally, we generalize and improve upon leading methods by removing reliance on pre-designated “target neighbo...
Investigating Distance Metrics in Semi-supervised Fuzzy c-Means for Breast Cancer Classification
In previous work, semi-supervised Fuzzy c-means (ssFCM) was used as an automatic classification technique to classify the Nottingham Tenovus Breast Cancer (NTBC) dataset as no method to do this currently exists. However, the results were poor when compared with semi-manual classification. It is known that the NTBC data is highly non-normal and it was suspected that this affected the poor result...
Feature Selection in Big Data by Using the enhancement of Mahalanobis–Taguchi System; Case Study, Identifying Bad Credit clients of a Private Bank of Islamic Republic of Iran
The Mahalanobis-Taguchi System (MTS) is a relatively new collection of methods proposed for diagnosis and forecasting using multivariate data. It consists of two main parts: Part 1, the selection of useful variables in order to reduce the complexity of multi-dimensional systems, and Part 2, diagnosis and prediction, which are used to predict the abnormal group according to the remaining us...
Publication date: 2015